Abstract

This paper describes results using the wavelet transform to
preprocess acoustic broadband signals in a system that discriminates
between different classes of acoustic bursts. This is motivated by
the similarity between the proportional bandwidth filters provided
by the wavelet transform and those found in biological hearing
systems. The experiment compares the performance of a statistical
pattern classifier on wavelet and FFT preprocessed acoustic signals.
The data used was from the “DARPA Phase 1” database, which consists
of artificially generated signals with real ocean background. The
results show that the wavelet transform did provide improved
performance when classifying on a frame-by-frame basis. The DARPA
Phase 1 database is well matched to proportional bandwidth
filtering; i.e., signal classes that contain high frequencies do
tend to have shorter duration in this database. It is also noted
that the decreasing background levels at high frequencies compensate
for the poor match of the wavelet transform for long duration (high
frequency) signals.

1  Introduction 

An ocean acoustic event classification system includes a
pre-processor, a frame-level classifier, and higher level decision
logic. For a known signal in background, there are a number of ways
to optimize each stage to maximize overall detection/classification
probabilities. For short duration ocean acoustic events, however, we
look for algorithms that are robust under different training
conditions. The goal of this study is to compare wavelet and Fourier
preprocessing for a system with the same classifier, in order to
identify the characteristics of the pre-processor that lead to good
overall performance.

There are a number of reasons for studying the wavelet transform.
Humans are excellent acoustic event classifiers. The wavelet
transform provides a proportional bandwidth (bandwidth increases in
proportion with center frequency), similar to the filters in the
human ear. There is considerable progress in the wavelet field. This
means that systems that use the wavelet transform are supported by a
rich understanding of its different aspects. Finally, the wavelet
transform projects a signal into a multi-resolution space that is
useful for image processing [1], speech coding [2], and sound
pattern analysis [3]. For ocean acoustics, this means that wavelet
transforms may be able to provide a small number of relevant
parameters for the classifiers, a property that usually leads to
good overall performance.

Nicolas has performed an extensive comparative study between
wavelet, FFT, Wigner, and other processors for a database of short
duration ocean acoustic events [4]. For the DARPA Phase I dataset,
Desai [5] used a wavelet transform, with sophisticated feature
extraction, to attain 0% error. Beck [6] performed a comparative
evaluation of wavelet and FFT preprocessing for neural net
classification, and found that wavelets lead to better performance
for this database. The contribution we make here is to offer some
explanations as to why the wavelet transform seems to lead to better
performance on the DARPA Phase I database. Section 2 provides a
brief introduction to the wavelet transform, which leaves out many
of the mathematical properties and focuses on the properties that we
feel are important. Section 3 describes our approach to this
problem. Section 4 describes the DARPA Phase I database and the
experiments performed. Section 4.1 discusses the experimental
results.

2  Background 

Since  the  wavelet  transform  will  be  compared  to  the 
Fourier  transform,  perhaps  the  best  way  to  introduce 
the  wavelet  transform  is  to  compare  and  contrast  it 
with  the  Short  Time  Fourier  Transform  (STFT)  in 
the  problem  of  time-frequency  localization. 

The STFT estimate of a signal, $f(t)$, in the time-frequency
domain, for the $m$th frequency and $n$th time component, is
defined as

$$c_{\mathrm{STFT}}[m,n] = \int f(t)\, g(t - n\Delta t)\, e^{-j m \Delta\omega t}\, dt \qquad (1)$$

where

$f(t)$ = the input signal,
$g(t)$ = a window,
$m\Delta\omega$ = frequency shift, and
$n\Delta t$ = time shift.

In other words, $c_{\mathrm{STFT}}[m,n]$ is a discrete Fourier
component of a windowed input. The window $g(t)$ is required to have
finite support in both the time and frequency domain, and to be
centered around $t = 0$ and $\omega = 0$ in the


Table 1: Contrasts between STFT and wavelet transform.

| STFT | WAVELET |
| --- | --- |
| $g(t)$ is centered at $t = 0$, $\omega = 0$ with compact support | $h(t)$ is centered at $t = 0$, $H(\omega = 0) = 0$, and $H(\omega)$ is centered at $k_0$ |
| $c_{\mathrm{STFT}}[m,n]$ samples $f(t)$ at $\omega = m\Delta\omega$, $t = n\Delta t$ | $c_W[m,n]$ samples $f(t)$ at $\omega = k_0/a^m$, $t = bn/a^{-m}$ |
| same bandwidth and time resolution for all $m$ and $n$: the bandwidth and time resolution of $g(t)$ | bandwidth scales as $a^{-m}(BW)$, time window scales as $(Length)/$ |

Figure 2: The sampling locations, bandwidth and time window length
for wavelet transform in the time-frequency domain.


Figure 1: The sampling locations, bandwidth and time window length
for STFT in the time-frequency domain.

time and frequency domain (in order to get resolution in time and
frequency). The wavelet transform, on the other hand, is defined as


where 

$h(t)$ = wavelet function,
$a^m$ = a dilation factor, and
$nb/a^{-m}$ = a time shift.

Here, $h(t)$, the wavelet function, is required to have finite
support in both time and frequency, centered around $t = 0$ in the
time domain, and centered around $\omega = k_0 \neq 0$ in the
frequency domain. It must not be centered at 0 in the frequency
domain because frequency shifting is achieved by scaling, which
gives a center frequency of $\omega = k_0/a^m$, and no frequency
shifting can be achieved if $k_0 = 0$.

The differences between the wavelet transform and STFT are
summarized in Table 1, and illustrated in Figs. 1 and 2. The main
point is that the wavelet transform provides proportional
bandwidths, with wider bandwidth/higher time resolution at high
frequencies, and finer bandwidth/wider time windows at lower
frequencies.
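
The contrast can be made concrete with a small numerical sketch. It assumes the scaling convention in which band m has center frequency k0/a^m, with bandwidth divided and time window multiplied by the same factor. The function names and the values of k0, bw0, length0 and the STFT spacing are hypothetical choices for illustration, not parameters of this study.

```python
def wavelet_bands(k0, bw0, length0, a=2.0, n_scales=4):
    """Return (center, bandwidth, window) per scale m for a wavelet bank.

    Assumed convention: center = k0 / a**m, bandwidth = bw0 / a**m,
    window = length0 * a**m, so center / bandwidth stays constant.
    """
    bands = []
    for m in range(n_scales):
        scale = a ** m
        bands.append((k0 / scale, bw0 / scale, length0 * scale))
    return bands


def stft_bands(delta_w, bw, length, n_bins=4):
    """Return (center, bandwidth, window) per bin m for a uniform STFT bank."""
    return [(m * delta_w, bw, length) for m in range(1, n_bins + 1)]


if __name__ == "__main__":
    # Illustrative values only (Hz and seconds).
    for fc, bw, win in wavelet_bands(k0=8000.0, bw0=4000.0, length0=0.001):
        print(f"wavelet: fc={fc:7.1f} Hz  bw={bw:7.1f} Hz  "
              f"window={win * 1e3:5.2f} ms  Q={fc / bw:.2f}")
    for fc, bw, win in stft_bands(delta_w=2000.0, bw=2000.0, length=0.005):
        print(f"stft:    fc={fc:7.1f} Hz  bw={bw:7.1f} Hz  "
              f"window={win * 1e3:5.2f} ms  Q={fc / bw:.2f}")
```

In the wavelet bank the ratio of center frequency to bandwidth is the same for every band, whereas in the STFT bank the bandwidth and window are the same for every bin.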

To see why the wavelet is a biological model, consider the
frequency response of the cochlea of a cat,



Figure 3: Frequency response of the hair cells at different parts
of the cochlea of a cat. After [Bertrand, 1989]


Figure 4: “Mel-scaled” filter bank: linearly spaced triangular
filters between 0 and 1 kHz, and logarithmic increments in both
frequency and bandwidth afterwards.




Figure 5: Octave band coder.
shown in Fig. 3. The biological ear, like the wavelet transform,
provides wider bandwidth at higher frequencies than at lower
frequencies. This fact is taken advantage of in many speech
processing front-ends. For instance, the “mel-scaled” cepstral
coefficients (Fig. 4) lead to improved speech recognition
performance [7]. The octave band speech coder, used for speech
compression [2], has the same time-frequency resolution and sampling
properties as the wavelet transform (Fig. 5).

2.0.1  Implementation

The octave band coder, shown in Fig. 5, represents the way the
decimated wavelets are implemented in practice. The wavelet function
is designed to pass frequencies from $\omega = \pi/2$ to $\pi$. At
each stage or “octave”, the wavelet function extracts the upper half
of the bandwidth. The remaining signal (extracted with a similar
low-pass filter) is decimated and then re-applied to the same set of
filters. To improve the frequency resolution in the high frequency
extraction step, the high frequency filter is divided into several
non-overlapping narrowband filters that collectively constitute the
high frequency filter, thereby creating several “voices”. The “tree
wavelet” [8], or “wavelet packets” [9], seeks to improve the
resolution in the high frequency filter by splitting not only the
low frequency component at the next octave, but also the high
frequency signal.
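
The recursion described above can be sketched in a few lines. This is a minimal illustration only: it substitutes the two-tap Haar filter pair for the wavelet and low-pass filters (the Morlet and Daubechies filters used in this study are not reproduced), omits voices, and the function names are hypothetical.

```python
import math


def haar_split(x):
    """Split x into decimated low-pass and high-pass half-bands (Haar pair)."""
    if len(x) % 2:
        raise ValueError("signal length must be even at every stage")
    r = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / r for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / r for i in range(0, len(x), 2)]
    return low, high


def octave_band_coder(x, n_octaves):
    """Return [high_1, ..., high_n, low_n], highest-frequency octave first."""
    bands = []
    low = list(x)
    for _ in range(n_octaves):
        # Extract the upper half-band, then re-apply the same filter pair
        # to the decimated low-pass remainder.
        low, high = haar_split(low)
        bands.append(high)
    bands.append(low)
    return bands


if __name__ == "__main__":
    signal = [math.sin(0.3 * n) + 0.5 * math.sin(2.5 * n) for n in range(64)]
    for k, band in enumerate(octave_band_coder(signal, 3)):
        print(k, len(band), round(sum(v * v for v in band), 4))
```

Each octave halves the number of output samples, which is the source of the wider time spacing at low frequencies. Because the Haar pair is orthonormal, the band energies sum to the energy of the input.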

3  Approach 

Performance comparison studies [10, 11] have demonstrated that a
number of pattern classifiers will produce results near the Bayes
optimal value. Therefore, for the purpose of this study, evaluation
will be restricted to the quadratic Gaussian classifier. Evaluation
will be performed based on frame-level classification results.
Recent studies have demonstrated that good frame-level recognition
rates do not automatically extend to good event-level recognition.
In [12], for instance, classifiers that provided the best
frame-level performance did not lead to good segmental-level
performance. However, for the purpose of this study, it was felt
that there is a strong enough correlation between the frame-level
scores and segmental-level scores that we can make a valid
interpretation from the frame-level scores. Here, the segmentation
of data samples (assigning class labels to pre-processor frame
outputs) is done by Hidden Markov Models (HMMs) and forced Viterbi
decoding, where the HMMs were trained on the training set.
Therefore, the segmentation boundaries are those that would lead to
the highest recognition rates on both the training and testing set.
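
For readers unfamiliar with the quadratic Gaussian classifier, the following is a minimal sketch of the general technique: one mean vector and one full covariance matrix per class, with each frame assigned to the class of highest log-likelihood plus log-prior. It is not the implementation used in this study; the function names, the small ridge term added to the covariance diagonal, and the toy data are assumptions made for illustration.

```python
import math


def _mean_cov(rows):
    """Sample mean vector and (biased) covariance matrix of a list of frames."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
            for j in range(d)] for i in range(d)]
    return mean, cov


def _cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def fit(frames, labels, ridge=1e-6):
    """Estimate a log-prior, mean and covariance factor for every class."""
    model = {}
    for c in sorted(set(labels)):
        rows = [f for f, y in zip(frames, labels) if y == c]
        mean, cov = _mean_cov(rows)
        for i in range(len(mean)):
            cov[i][i] += ridge  # assumed regularizer, keeps cov invertible
        model[c] = (math.log(len(rows) / len(frames)), mean, _cholesky(cov))
    return model


def log_score(x, log_prior, mean, low):
    """Log-prior plus Gaussian log-density, dropping the shared 2*pi term."""
    d = len(mean)
    z = [0.0] * d
    for i in range(d):  # forward substitution: solve low z = x - mean
        acc = sum(low[i][k] * z[k] for k in range(i))
        z[i] = (x[i] - mean[i] - acc) / low[i][i]
    maha = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return log_prior - 0.5 * (log_det + maha)


def classify(model, x):
    """Return the class label with the highest quadratic discriminant score."""
    return max(model, key=lambda c: log_score(x, *model[c]))


if __name__ == "__main__":
    frames = [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2], [0.1, -0.2],
              [5.0, 5.0], [5.3, 4.8], [4.7, 5.1], [5.1, 5.4]]
    labels = ["noise"] * 4 + ["E"] * 4
    print(classify(fit(frames, labels), [4.9, 5.2]))
```

Because each class keeps its own covariance, the decision boundaries are quadratic in the frame vector, and the number of parameters grows with the square of the frame size.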

4  Experiments 

The six signals of the DARPA Phase I dataset were processed by the
conventional FFT (summarized in Table 2), and wavelets (Table 3).
The frame-by-frame outputs of the pre-processors being studied are
then fed into a quadratic Gaussian classifier.

Table  4  shows  the  percentage  errors  resulting  from 
the  Gaussian  classifier  for  the  various  pre-processors. 


Table  2:  FFT  Pre-processing  schemes.  NAVG  is 
the  number  of  FFT  frames  averaged  together,  DATA 
WINDOW  is  the  amount  of  data  that  each  frame  of 
output  examines.  FRAME  SIZE  is  the  dimensionality 
of  data  supplied  to  the  classifier. 


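| id | fft size | % overlap | navg | data window | frame size |
| --- | --- | --- | --- | --- | --- |
| ft128 | 128 | 50 | 1 | 5 msec | 65 |
| ft128avg3 | 128 | 50 | 3 | 10 msec | 65 |
| ft256 | 256 | 75 | 1 | 10 msec | 129 |
| ft64avg4 | 64 | 50 | 4 | 6.25 msec | 33 |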
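
The entries of Table 2 can be cross-checked with a short calculation. The sketch below assumes that the frame size is the number of non-redundant bins of a real-input FFT, that averaged frames advance by the non-overlapped part of the FFT length, and that the sample rate is 25.6 kHz. The sample rate is not stated in the text; it is inferred here from the 128 point, 5 msec entry. The function names are hypothetical.

```python
def frame_size(fft_size):
    """Number of non-redundant bins of a real-input FFT: DC through Nyquist."""
    return fft_size // 2 + 1


def data_window_msec(fft_size, overlap_pct, navg, sample_rate_hz):
    """Span of input data covered by navg overlapped FFT frames, in msec."""
    hop = fft_size * (100 - overlap_pct) // 100
    n_samples = fft_size + (navg - 1) * hop
    return 1e3 * n_samples / sample_rate_hz


if __name__ == "__main__":
    FS = 25600.0  # assumed sample rate in Hz; not stated in the text
    schemes = {"ft128": (128, 50, 1), "ft128avg3": (128, 50, 3),
               "ft256": (256, 75, 1), "ft64avg4": (64, 50, 4)}
    for name, (n, ovl, navg) in schemes.items():
        print(name, frame_size(n), data_window_msec(n, ovl, navg, FS))
```

Under these assumptions the computed frame sizes and data windows agree with the values listed in Table 2.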

Table 3: Wavelet pre-processing schemes for the decimated wavelet
and the tree wavelet. The Morlet wavelet is a modulated Gaussian
pulse. The Daubechies wavelet is a 4 point orthogonal wavelet
function. ΔT is the time interval between wavelet outputs.

| id | # octaves | # voices | wavelet | window | ΔT | frame size |
| --- | --- | --- | --- | --- | --- | --- |
| decwvlt | 7 | 4 | Morlet | 168.4 msec | .5 msec | 28 |
| tree | 1 | 8 | Daubechies | 2.56 msec | .04 msec | 17 |

Table  4:  Final  classification  percentage  errors. 


| PRE-PROCESSOR | PERCENTAGE ERROR |
| --- | --- |
| decwvlt | Oi |
| ft128 | 7.93 |
| ft128avg3 | 8.98 |
| ft256 | 2.58 |
| ft64avg4 | 6.37 |
| tree | 23.40 |

Table  5:  Percentage  confusion  matrix  for  DECWVLT. 


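| TRUE CLASS \ ESTIMATED CLASS | # | A | B | C | D | E |
| --- | --- | --- | --- | --- | --- | --- |
| A | 0 | 100 | 0 | 0 | 0 | 0 |
| B | 0 | 0 | 100 | 0 | 0 | 0 |
| C | 0 | 0 | 0 | 100 | 0 | 0 |
| D | 0 | 0 | 0 | 0 | 100 | 0 |
| E | .19 | 0 | 0 | 0 | 0 | 99.81 |

Row # entries: 99.89, .02, .08
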
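Table 6: Percentage confusion matrix for FT128.

| TRUE CLASS \ ESTIMATED CLASS | # | A | B | C | D | E |
| --- | --- | --- | --- | --- | --- | --- |
| A | 0 | 100 | 0 | 0 | 0 | 0 |
| B | 31.25 | 0 | 56.15 | 0 | 0 | 12.5 |
| C | 0 | 0 | 0 | 100 | 0 | 0 |
| D | 0 | 0 | 0 | 0 | 97.62 | 2.38 |
| E | 6.96 | 0 | 0 | 0 | 0 | 93.04 |

Row # entries: 505“, (J, 8.35

Table 7: Percentage confusion matrix for FT128AVG3.

| TRUE CLASS \ ESTIMATED CLASS | # | A | B | C | D | E |
| --- | --- | --- | --- | --- | --- | --- |
| A | 0 | 100 | 0 | 0 | 0 | 0 |
| B | 40 | 0 | 60 | 0 | 0 | 0 |
| C | 0 | 0 | 0 | 100 | 0 | 0 |
| D | 0 | 0 | 0 | 0 | 100 | 0 |
| E | 3.70 | 0 | 0 | 0 | 0 | 96.30 |

Row # entries: 88.77, .15, .31, 10.77

Table 8: Percentage confusion matrix for FT64AVG4.

| TRUE CLASS \ ESTIMATED CLASS | # | A | B | C | D | E |
| --- | --- | --- | --- | --- | --- | --- |
| A | 0 | 100 | 0 | 0 | 0 | 0 |
| B | 0 | 0 | 100 | 0 | 0 | 0 |
| C | 0 | 0 | 0 | 100 | 0 | 0 |
| D | 0 | 0 | 0 | 0 | 100 | 0 |
| E | 9.15 | 0 | 0 | 0 | 0 | 90.85 |

Row # entries: 94.87, 4.90

Table 9: Percentage confusion matrix for FT256.

| TRUE CLASS \ ESTIMATED CLASS | # | A | B | C | D | E |
| --- | --- | --- | --- | --- | --- | --- |
| A | 14.29 | 85.71 | 0 | 0 | 0 | 0 |
| B | 37.5 | 0 | 62.5 | 0 | 0 | 0 |
| C | 0 | 0 | 0 | 100 | 0 | 0 |
| D | 4.44 | 0 | 0 | 0 | 95.56 | 0 |
| E | 5.37 | 0 | 0 | 0 | 0 | 94.63 |

Row # entries: 98.3, .14, 1.56

Table 10: Percentage confusion matrix for TREE.

| TRUE CLASS \ ESTIMATED CLASS | # | A | B | C | D | E |
| --- | --- | --- | --- | --- | --- | --- |
| A | 0 | 96 | 0 | 0 | 0 | 4 |
| B | 1.04 | 0 | 98.96 | 0 | 0 | 0 |
| C | 0 | 0 | 0 | 100 | 0 | 0 |
| D | 0 | 0 | 0 | 0 | 100 | 0 |
| E | 20.97 | 0 | 1.33 | 0 | .51 | 77.19 |

Row # entries: 3.14, 20.92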


Figure  6:  Signal  “B”. 


The confusion matrices are shown in Tables 5 through 10.

4.1  Discussion  of  Results 

The results, according to Table 4, show that wavelet pre-processing
is only marginally superior to FFT. This is consistent with the
results of other studies into this problem. In fact, a statistical
analysis may reveal that the difference is not significant. However,
an examination of the confusion matrices reveals several distinct
advantages of the wavelet transform.

First of all, the decimated wavelet performed better than the FFT
based methods on signal E. Signal E is a long duration sinusoid. It
occurs in a low frequency region that coincides with a region of
high levels of ocean noise (high relative to other frequency bands).
The time-frequency plots show that the wavelet transform provided
sharp, distinct features for signal E. This is because at low
frequency, the decimated wavelet has a large time window,
corresponding to a significant SNR improvement via processing gain.
The only other plots that appear to be quite distinct for signal E
are those of the 256 point FFT and the 128 point FFT with 3 frame
averaging. However, the decimated wavelet resulted in better
performance on signal E than the 256 point FFT, because it can
integrate over a time window of 168.4 msec, whereas the 256 point
FFT integrates over only 10 msec.
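
A rough size for this processing gain can be estimated under an idealized model. The sketch below assumes a steady sinusoid in white noise, for which the SNR after integration grows in proportion to the integration time. Ocean noise is not white and the analysis windows are not rectangular, so the figure is indicative only. The function name is hypothetical.

```python
import math


def integration_gain_db(window_a_msec, window_b_msec):
    """SNR gain in dB of integrating over window A instead of window B.

    Assumes a steady sinusoid in white noise, so that SNR grows in
    proportion to integration time (an idealization).
    """
    return 10.0 * math.log10(window_a_msec / window_b_msec)


if __name__ == "__main__":
    gain = integration_gain_db(168.4, 10.0)
    print(f"decimated wavelet vs 256 point FFT: {gain:.1f} dB")
```

With the window lengths quoted above, this idealized model gives a gain of roughly 12 dB in favor of the decimated wavelet on a signal such as E.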

In the case of signal B, Fig. 6, the wavelet transform was able to
demonstrate its multi-resolution advantages. Signal B has important
features in both its gross and fine details. Since signal B consists
of two pulses that are approximately 30 to 40 msec apart, a
pre-processor with a shorter window will likely mistake the middle
part of signal B for noise. Thus, large FFTs, such as the 256 point
FFT, fared far better than the 128 point FFT with single frame
averaging. The solution to this problem using the FFT is to use
longer FFTs; however, wavelets allow for classification of long
events, such as signal B, without compromising temporal resolution.
Finally, note that the decimated wavelet provided the best overall
performance. This may be an indication that it has provided a
succinct, parsimonious representation of the essential features of
this database. We note that the decimated wavelet output has only 28
points. The FFT can provide a small dimensionality (the 64 point
FFT, for instance, provides 33 points), but not without degradation
in performance.

The results provided by the tree wavelet are preliminary. It seems
that 8 voices are not enough for this problem.

5  Conclusions 

The use of the wavelet transform as a front end of an acoustic
broadband discriminator is advantageous under at least two
conditions. The first condition is the recognition of a long
duration low frequency burst in the ocean, where most of the
background noise is also at low frequency. In this low SNR case,
longer wavelet filters at low frequency provide higher SNR through
processing gain (a stronger signal component resulting from
integrating the data over a longer period of time). The second
condition is where multi-resolution is required. Pulse trains
emitted by dolphins, for instance, have a fine feature for the
individual pulses, and a coarse feature characterized by the
frequency and duration of the pulse train. If both features are
important characteristics, the wavelet transform provides good
feature extraction.

The experiments here also suggest that wavelet transforms may
provide a parsimonious representation of the signal, which can
represent all the essential features of a signal with few
parameters. However, these experiments do not imply that the wavelet
transform is better for all other databases. Using the converse of
the arguments made above, one would guess that the wavelet transform
might not perform as well as the FFT for a long duration high
frequency signal when the amount of noise is significant.